GH-48864: [C++] Support customizing more Zstd parameters#48865
GH-48864: [C++] Support customizing more Zstd parameters#48865pitrou merged 3 commits intoapache:mainfrom
Conversation
|
|
e60e703 to
730eb94
Compare
730eb94 to
4ab207e
Compare
|
@pitrou Thank you for your detailed review! I've fixed these comments. Please take a look again. |
pitrou
left a comment
There was a problem hiding this comment.
This looks mostly good! Just a few nits.
| {5, GZipFormat::DEFLATE, -12, false}, | ||
| {-992, GZipFormat::GZIP, 15, false}}; | ||
|
|
||
| template <std::derived_from<arrow::util::CodecOptions> T> |
| ARROW_ASSIGN_OR_RAISE(auto cctx, CreateCCtx()); | ||
| size_t ret = ZSTD_compressCCtx(cctx.get(), output_buffer, | ||
| static_cast<size_t>(output_buffer_len), input, | ||
| static_cast<size_t>(input_len), compression_level_); |
There was a problem hiding this comment.
Funny that the compression level is passed a second time here, after we already gave it to ZSTD_CCtx_setParameter. I suppose it doesn't matter?
There was a problem hiding this comment.
Context will also be passed to ZSTDCompressor and it needs compression_level_. We can move auto ret = ZSTD_CCtx_setParameter(cctx.get(), ZSTD_c_compressionLevel, compression_level_) to MakeCompressor though I think the readability will decrease a bit.
There was a problem hiding this comment.
After I check the Zstd's document again, I think there is a mistake here. We need to use ZSTD_compress2 instead of ZSTD_compressCCtx because ZSTD_compressCCtx will reset other parameters. Thank you for your good attention!
/*! ZSTD_compressCCtx() :
* Same as ZSTD_compress(), using an explicit ZSTD_CCtx.
* Important : in order to mirror `ZSTD_compress()` behavior,
* this function compresses at the requested compression level,
* __ignoring any other advanced parameter__ .
* If any advanced parameter was set using the advanced API,
* they will all be reset. Only @compressionLevel remains.
*/
ZSTDLIB_API size_t ZSTD_compressCCtx(ZSTD_CCtx* cctx,
void* dst, size_t dstCapacity,
const void* src, size_t srcSize,
int compressionLevel);
/*! ZSTD_compress2() :
* Behave the same as ZSTD_compressCCtx(), but compression parameters are set using the advanced API.
* (note that this entry point doesn't even expose a compression level parameter).
* ZSTD_compress2() always starts a new frame.
* Should cctx hold data from a previously unfinished frame, everything about it is forgotten.
* - Compression parameters are pushed into CCtx before starting compression, using ZSTD_CCtx_set*()
* - The function is always blocking, returns when compression is completed.
* NOTE: Providing `dstCapacity >= ZSTD_compressBound(srcSize)` guarantees that zstd will have
* enough space to successfully compress the data, though it is possible it fails for other reasons.
* @return : compressed size written into `dst` (<= `dstCapacity),
* or an error code if it fails (which can be tested using ZSTD_isError()).
*/
ZSTDLIB_API size_t ZSTD_compress2( ZSTD_CCtx* cctx,
void* dst, size_t dstCapacity,
const void* src, size_t srcSize);There was a problem hiding this comment.
I have added a new test TEST(TestCodecMisc, ZstdLargerWindowLog) to ensure the other parameters work effectively.
dfbca3b to
84dd650
Compare
5d26717 to
698aefd
Compare
wgtmac
left a comment
There was a problem hiding this comment.
Looks good to me (I haven't gone through the test). Just left some minor comments.
| } | ||
|
|
||
| TEST(TestWriterProperties, SetCodecOptions) { | ||
| constexpr int ZSTD_c_windowLog = 101; |
There was a problem hiding this comment.
Should we use kZstdCWindowLog to be consistent with this code base?
There was a problem hiding this comment.
I suggest to keep the name style of zstd. Because I think it is more readable, although it is inconsistent with our coding style.
There was a problem hiding this comment.
I agree we can just keep this in the tests.
| } | ||
|
|
||
| TEST(TestWriterProperties, SetCodecOptions) { | ||
| constexpr int ZSTD_c_windowLog = 101; |
There was a problem hiding this comment.
I agree we can just keep this in the tests.
20955d7 to
c6c7282
Compare
4dd93e4 to
2af1945
Compare
|
@github-actions crossbow submit -g cpp |
|
Revision: 2af1945 Submitted crossbow builds: ursacomputing/crossbow @ actions-512d34afcc |
|
@HuaHuaY The new tests need to be skipped if zstd is not available: |
2af1945 to
a6b30fd
Compare
My mistake. I have added an if statement to skip the tests. @pitrou |
|
@github-actions crossbow submit cppminimal* |
|
Revision: a6b30fd Submitted crossbow builds: ursacomputing/crossbow @ actions-3fe0b907c4
|
a6b30fd to
d5c8568
Compare
|
Sorry that I have refactored a little code. Could you help me take a look and approve this PR again? @pitrou |
|
@github-actions crossbow submit cppminimal* |
|
Revision: d5c8568 Submitted crossbow builds: ursacomputing/crossbow @ actions-c4d50b6264
|
|
Thanks again @HuaHuaY ! |
|
After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit 3ed9169. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 4 possible false positives for unstable benchmarks that are known to sometimes produce them. |
Rationale for this change
Support customizing more Zstd parameters to allow users to optimize the performance of parquet page compression/decompression.
What changes are included in this PR?
ZstdCodecOptionsinherited fromCodecOptions.ZSTD_CCtx/ZSTD_DCtxexplicitly inZSTDCodec.Are these changes tested?
Yes.
Are there any user-facing changes?
Yes. The new class
ZstdCodecOptionsis added.